Twenty Years of Student Scholarship: Celebrating the Dalhousie Journal of Legal Studies
In this paper, we propose a novel controllable text-to-image generative
adversarial network (ControlGAN), which can effectively synthesise high-quality
images and also control parts of the image generation according to natural
language descriptions. To achieve this, we introduce a word-level spatial and
channel-wise attention-driven generator that can disentangle different visual
attributes, and allow the model to focus on generating and manipulating
subregions corresponding to the most relevant words. Also, a word-level
discriminator is proposed to provide fine-grained supervisory feedback by
correlating words with image regions, facilitating training an effective
generator which is able to manipulate specific visual attributes without
affecting the generation of other content. Furthermore, perceptual loss is
adopted to reduce the randomness involved in the image generation, and to
encourage the generator to manipulate specific attributes required in the
modified text. Extensive experiments on benchmark datasets demonstrate that our
method outperforms existing state of the art, and is able to effectively
manipulate synthetic images using natural language descriptions. Code is
available at https://github.com/mrlibw/ControlGAN. Comment: NeurIPS 201
EC Agricultural Prices. Price Indices and absolute prices-Quarterly Statistics 1-1993
We propose MAD-GAN, an intuitive generalization of Generative Adversarial Networks (GANs) and their conditional variants to address the well-known problem of mode collapse. First, MAD-GAN is a multi-agent GAN architecture incorporating multiple generators and one discriminator. Second, to enforce that different generators capture diverse high-probability modes, the discriminator of MAD-GAN is designed such that, along with distinguishing real from fake samples, it is also required to identify the generator that produced a given fake sample. Intuitively, to succeed in this task, the discriminator must learn to push different generators towards different identifiable modes. We perform extensive experiments on synthetic and real datasets and compare MAD-GAN with different variants of GAN. We show high-quality, diverse sample generations for challenging tasks such as image-to-image translation and face generation. In addition, we show that MAD-GAN is able to disentangle different modalities when trained on a highly challenging diverse-class dataset (e.g. a dataset with images of forests, icebergs, and bedrooms). Finally, we show its efficacy on the unsupervised feature representation task.
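The discriminator objective described in this abstract can be illustrated with a small sketch. It assumes, for illustration only, that the discriminator emits one logit per generator plus one extra logit for "real" and is trained with cross-entropy; the abstract does not spell out this layout, and the names `discriminator_target` and `discriminator_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_target(num_generators, source):
    """One-hot target over num_generators + 1 classes (assumed layout).

    source is a generator index in [0, num_generators) for a fake sample,
    or None for a real sample, which takes the last class.
    """
    target = [0.0] * (num_generators + 1)
    index = num_generators if source is None else source
    target[index] = 1.0
    return target


def discriminator_loss(logits, target):
    """Cross-entropy between the softmax of the logits and a one-hot target."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


# Three generators: a fake sample from generator 1, then a real sample.
fake_target = discriminator_target(3, 1)
real_target = discriminator_target(3, None)
print(fake_target, real_target)
print(discriminator_loss([0.0, 2.0, 0.0, 0.0], fake_target))
```

Under this assumed layout, the discriminator is only rewarded when it names the correct generator, which is the pressure towards distinct modes that the abstract describes.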
Diagnosing and Preventing Instabilities in Recurrent Video Processing.
Recurrent models are a popular choice for video enhancement tasks such as video denoising or super-resolution. In this work, we focus on their stability as dynamical systems and show that they tend to fail catastrophically at inference time on long video sequences. To address this issue, we (1) introduce a diagnostic tool which produces input sequences optimized to trigger instabilities and that can be interpreted as visualizations of temporal receptive fields, and (2) propose two approaches to enforce the stability of a model during training: constraining the spectral norm or constraining the stable rank of its convolutional layers. We then introduce Stable Rank Normalization for Convolutional layers (SRN-C), a new algorithm that enforces these constraints. Our experimental results suggest that SRN-C successfully enforces stability in recurrent video processing models without a significant performance loss.
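The two quantities constrained in this abstract can be sketched for a plain matrix. This is not the SRN-C algorithm: it only computes the spectral norm (largest singular value, estimated by power iteration) and the stable rank, taken here as the squared Frobenius norm divided by the squared spectral norm. Treating a layer as a dense matrix is a simplifying assumption, and all function names are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def spectral_norm(matrix, iters=200, seed=0):
    """Estimate the largest singular value by power iteration on A^T A."""
    rng = random.Random(seed)
    matrix_t = transpose(matrix)
    v = [rng.gauss(0.0, 1.0) for _ in matrix[0]]
    for _ in range(iters):
        w = matvec(matrix_t, matvec(matrix, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # v was mapped to zero, as happens for a zero matrix
        v = [x / norm for x in w]
    return math.sqrt(sum(x * x for x in matvec(matrix, v)))


def stable_rank(matrix):
    """Squared Frobenius norm divided by squared spectral norm."""
    frobenius_sq = sum(x * x for row in matrix for x in row)
    sigma = spectral_norm(matrix)
    return frobenius_sq / (sigma * sigma)


def clip_spectral_norm(matrix, bound=1.0):
    """Rescale the matrix so its spectral norm does not exceed the bound."""
    scale = max(1.0, spectral_norm(matrix) / bound)
    return [[x / scale for x in row] for row in matrix]


weights = [[3.0, 0.0], [0.0, 4.0]]
print(spectral_norm(weights), stable_rank(weights))
```

A spectral norm of at most 1 makes the linear map non-expansive, which is the usual motivation for this kind of constraint in a recurrent setting.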
Automatic cattle identification using graph matching based on local invariant features
Cattle muzzle classification can be considered a biometric identifier important to animal traceability systems that ensure the integrity of the food chain. This paper presents a muzzle-based classification system that combines local invariant features with graph matching. The proposed approach consists of three phases, namely feature extraction, graph matching, and matching refinement. The experimental results showed that our approach is superior to existing work, achieving correct identification for all tested images. In addition, the results showed that the proposed method achieved this high accuracy even when the test images were rotated at various angles.
lp-Recovery of the Most Significant Subspace among Multiple Subspaces with Outliers
We assume data sampled from a mixture of d-dimensional linear subspaces with
spherically symmetric distributions within each subspace and an additional
outlier component with spherically symmetric distribution within the ambient
space (for simplicity we may assume that all distributions are uniform on their
corresponding unit spheres). We also assume mixture weights for the different
components. We say that one of the underlying subspaces of the model is most
significant if its mixture weight is higher than the sum of the mixture weights
of all other subspaces. We study the recovery of the most significant subspace
by minimizing the lp-averaged distances of data points from d-dimensional
subspaces, where p>0. Unlike other lp minimization problems, this minimization
is non-convex for all p>0 and thus requires different methods for its analysis.
We show that if 0<p<=1, then for any fraction of outliers the most significant
subspace can be recovered by lp minimization with overwhelming probability
(which depends on the generating distribution and its parameters). We show that
when adding small noise around the underlying subspaces the most significant
subspace can be nearly recovered by lp minimization for any 0<p<=1 with an
error proportional to the noise level. On the other hand, if p>1 and there is
more than one underlying subspace, then with overwhelming probability the most
significant subspace cannot be recovered or nearly recovered. This last result
does not require spherically symmetric outliers. Comment: This is a revised version of the part of 1002.1994 that deals with
single subspace recovery. V3: Improved estimates (in particular for Lemma 3.1
and for estimates relying on it), asymptotic dependence of probabilities and
constants on D and d and further clarifications; for simplicity it assumes
uniform distributions on spheres. V4: minor revision for the published
version.
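The objective minimized in this abstract can be made concrete with a short sketch. It assumes the candidate subspace is given by an orthonormal basis and that the lp-averaged distance is the mean of the p-th powers of the point-to-subspace distances; the exact normalization is an assumption, and the function names are hypothetical.

```python
import math


def distance_to_subspace(point, basis):
    """Euclidean distance from a point to the span of an orthonormal basis."""
    norm_sq = sum(x * x for x in point)
    projected_sq = sum(
        sum(x * b for x, b in zip(point, vector)) ** 2 for vector in basis
    )
    # Clamp at zero to absorb rounding error when the point lies in the subspace.
    return math.sqrt(max(0.0, norm_sq - projected_sq))


def lp_averaged_distance(points, basis, p):
    """Mean of dist(x, L)^p over the data points, for p > 0."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(distance_to_subspace(x, basis) ** p for x in points) / len(points)


# Two points in the plane, scored against the x-axis as a 1-dimensional subspace.
points = [(1.0, 0.0), (0.0, 2.0)]
x_axis = [(1.0, 0.0)]
print(lp_averaged_distance(points, x_axis, 1.0))
print(lp_averaged_distance(points, x_axis, 2.0))
```

Raising distances to a power p > 1 weights far-away points more heavily, which gives some intuition for why the abstract separates the regimes 0 < p <= 1 and p > 1.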
Relative Pose from Deep Learned Depth and a Single Affine Correspondence
We propose a new approach for combining deep-learned non-metric monocular
depth with affine correspondences (ACs) to estimate the relative pose of two
calibrated cameras from a single correspondence. Considering the depth
information and affine features, two new constraints on the camera pose are
derived. The proposed solver is usable within 1-point RANSAC approaches. Thus,
the processing time of the robust estimation is linear in the number of
correspondences and, therefore, orders of magnitude faster than by using
traditional approaches. The proposed 1AC+D solver is tested both on synthetic
data and on 110395 publicly available real image pairs where we used an
off-the-shelf monocular depth network to provide up-to-scale depth per pixel.
The proposed 1AC+D leads to similar accuracy as traditional approaches while
being significantly faster. When solving large-scale problems, e.g., pose-graph
initialization for Structure-from-Motion (SfM) pipelines, the overhead of
obtaining ACs and monocular depth is negligible compared to the speed-up gained
in the pairwise geometric verification, i.e., relative pose estimation. This is
demonstrated on scenes from the 1DSfM dataset using a state-of-the-art global
SfM algorithm. Source code: https://github.com/eivan/one-ac-pos
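The speed-up from a single-correspondence solver can be illustrated with the standard RANSAC iteration bound. This formula is textbook material and is not taken from the paper. The comparison with a sample size of 5 is an assumption chosen for illustration, as are the inlier ratio and confidence values; the function name is hypothetical.

```python
import math


def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Iterations needed to draw at least one all-inlier sample.

    Uses N = log(1 - confidence) / log(1 - inlier_ratio ** sample_size).
    """
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    all_inlier_prob = inlier_ratio ** sample_size
    if all_inlier_prob >= 1.0:
        return 1  # every sample is outlier-free
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - all_inlier_prob))


# Illustrative values only: 50% inliers, 99% confidence.
print(ransac_iterations(0.5, 1))  # sample of one correspondence
print(ransac_iterations(0.5, 5))  # assumed larger minimal sample for comparison
```

Because the required number of iterations grows quickly with the sample size, a solver that needs only one correspondence keeps the robust estimation loop short even at low inlier ratios.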
Bottom-up Instance Segmentation using Deep Higher-Order CRFs
Traditional Scene Understanding problems such as Object Detection and Semantic Segmentation have made breakthroughs in recent years due to the adoption of deep learning. However, the former task is not able to localise objects at a pixel level, and the latter task has no notion of different instances of objects of the same class. We focus on the task of Instance Segmentation, which recognises and localises objects down to a pixel level. Our model is based on a deep neural network trained for semantic segmentation. This network incorporates a Conditional Random Field with end-to-end trainable higher-order potentials based on object detector outputs. This allows us to reason about instances from an initial, category-level semantic segmentation. Our simple method effectively leverages the great progress recently made in semantic segmentation and object detection. The accuracy of the instance-level segmentations that our network produces is reflected in the considerable improvements obtained over previous work at high APr IoU thresholds.
Pixelwise instance segmentation with a dynamically instantiated network
Semantic segmentation and object detection research have recently achieved rapid progress. However, the former task has no notion of different instances of the same object, and the latter operates at a coarse, bounding-box level. We propose an Instance Segmentation system that produces a segmentation map where each pixel is assigned an object class and instance identity label. Most approaches adapt object detectors to produce segments instead of boxes. In contrast, our method is based on an initial semantic segmentation module, which feeds into an instance subnetwork. This subnetwork uses the initial category-level segmentation, along with cues from the output of an object detector, within an end-to-end CRF to predict instances. This part of our model is dynamically instantiated to produce a variable number of instances per image. Our end-to-end approach requires no post-processing and considers the image holistically, instead of processing independent proposals. Therefore, unlike some related work, a pixel cannot belong to multiple instances. Furthermore, far more precise segmentations are achieved, as shown by our substantial improvements at high APr thresholds.